Tales from the Loop

A very quick introduction to looping in R

Jim Rose

Oct 26 2022

Outline

  • What is a programming loop?
  • For loop
    • The “counter index” variable
  • Nested loops
  • Coding loops vs functions
    • Functions inside a loop
    • Looping vs vectorization
  • Some advice on when to use loops

What is a programming loop?

A loop is a set of instructions that lets you apply one or more custom code operations repeatedly for a predetermined number of cycles (aka loops).

https://contest-server.cs.uchicago.edu

Necessary Components of a Programming Loop

  • Iterator or counter

    • marks how many loops have occurred
  • Code to be run repeatedly

    • part of the code doing the actual work!
  • Exit condition

    • determines when to stop looping
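
A minimal sketch of the three components, written as a while loop (a different loop type from the for loop used in the rest of this deck) so that each piece sits on its own line. The names counter, max_loops and squares are made up for illustration.

```r
# Hypothetical names, for illustration only
counter <- 1                     # iterator: marks which cycle we are on
max_loops <- 3                   # limit used by the exit condition
squares <- numeric(max_loops)    # output object, created before looping

while (counter <= max_loops) {   # exit condition: stop once counter passes max_loops
  squares[counter] <- counter^2  # code to be run repeatedly
  counter <- counter + 1         # advance the iterator
}
squares
```

A for loop bundles the iterator and the exit condition into the single `for (i in ...)` line.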

Let’s say you wanted to find the mean and standard deviation of each row in this matrix

mymatrix <- matrix(rnorm(15, mean=10, sd=3),
                   nrow=3, ncol=5
                   )
mymatrix
          [,1]     [,2]      [,3]      [,4]      [,5]
[1,]  7.230089 5.421025 11.716285  8.983851  7.364509
[2,]  6.457852 8.559140  9.583536  4.094231 10.182109
[3,] 13.873343 5.879522  8.314007 10.722369  7.558678

The For Loop

for (i in 1:nrow(mymatrix)){
  print(i)
}
[1] 1
[1] 2
[1] 3
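
One caveat with the 1:nrow(...) pattern, sketched under the assumption of a hypothetical zero-row input called empty_matrix: 1:0 counts down instead of being empty, so seq_len() may be the safer way to build the counter.

```r
# Hypothetical zero-row input, for illustration only
empty_matrix <- matrix(numeric(0), nrow = 0, ncol = 5)

length(1:nrow(empty_matrix))         # 2, because 1:0 is the vector c(1, 0)
length(seq_len(nrow(empty_matrix)))  # 0, so a loop over it never runs

loops_run <- 0
for (i in seq_len(nrow(empty_matrix))) {
  loops_run <- loops_run + 1
}
loops_run
```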

Calculating row-wise mean and sd

  • First create an empty output object

  • An object to hold the output must be created OUTSIDE the loop first; otherwise the first assignment inside the loop fails with an "object not found" error

row_stats <- data.frame(means=rep(NA, nrow(mymatrix)),
                       stdev=rep(NA, nrow(mymatrix)))

row_stats
  means stdev
1    NA    NA
2    NA    NA
3    NA    NA
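
To see why the output object has to exist first, here is a small sketch. unmade_stats and made_stats are hypothetical names; tryCatch() is used only so the error is captured and the script keeps running.

```r
# `unmade_stats` is deliberately never created
attempt <- tryCatch({
  unmade_stats$means[1] <- 5       # assigning into an object that does not exist
  "assignment worked"
}, error = function(e) "error raised")
attempt

# With the object created up front, the same kind of assignment succeeds
made_stats <- data.frame(means = rep(NA, 3))
made_stats$means[1] <- 5
made_stats
```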

Then loop through each row

row_stats <- data.frame(means=rep(NA, nrow(mymatrix)),
                       stdev=rep(NA, nrow(mymatrix)))

for (i in 1:nrow(mymatrix)){
  row_stats$means[i] <- mean(mymatrix[i,])
  row_stats$stdev[i] <- sd(mymatrix[i,])
}
row_stats
     means    stdev
1 8.143152 2.362414
2 7.775374 2.498139
3 9.269584 3.107975
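
For comparison, the same row statistics can be computed without writing a loop. This sketch uses base R's rowMeans() and apply(); example_matrix and the fixed seed are assumptions added here so the block runs on its own.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
example_matrix <- matrix(rnorm(15, mean = 10, sd = 3), nrow = 3, ncol = 5)

row_stats_vec <- data.frame(
  means = rowMeans(example_matrix),      # mean of each row
  stdev = apply(example_matrix, 1, sd)   # MARGIN = 1 applies sd() to each row
)
row_stats_vec
```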

Another way to write a for loop

  • The counter variable can be named anything you like

  • R steps through whatever you supply after the in keyword, assigning each element in turn to the counter variable

dragons <- c("Vhagar","Caraxes","Syrax","Meleys")

for (dragon in dragons){
  print(paste(dragon, "the fearsome", sep=" "))
}
[1] "Vhagar the fearsome"
[1] "Caraxes the fearsome"
[1] "Syrax the fearsome"
[1] "Meleys the fearsome"
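
If you need both the position and the value, one option is to loop over seq_along() and index back into the vector. A small sketch; the counter k and the output vector numbered are hypothetical names.

```r
dragons <- c("Vhagar", "Caraxes", "Syrax", "Meleys")

numbered <- character(length(dragons))   # output object created before the loop
for (k in seq_along(dragons)) {
  # k is the position, dragons[k] is the value at that position
  numbered[k] <- paste(k, dragons[k], sep = ": ")
}
numbered
```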

Looping to Z-score Normalize

Now let’s use these same calculations to create a z-scored version of the original matrix

# First create an empty output object
zscored <- matrix(nrow=nrow(mymatrix), ncol=ncol(mymatrix))

for (i in 1:nrow(mymatrix)){
  #Then calculate the row statistics
  row_mean <- mean(mymatrix[i,])
  row_sd <- sd(mymatrix[i,])
  #Then use these values to normalize each entry in the matrix
  for (j in 1:ncol(mymatrix)){
    zscored[i,j] <- (mymatrix[i,j] - row_mean)/row_sd
  }
}
zscored
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.3864956 -1.1522649  1.5124925  0.3558644 -0.3295964
[2,] -0.5274012  0.3137401  0.7238038 -1.4735541  0.9634114
[3,]  1.4812730 -1.0907623 -0.3074597  0.4674380 -0.5504890
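
A quick sanity check for a row-wise z-score: every row should end up with mean 0 and sd 1. The sketch below gets the same normalization from base R's scale(), which works on columns, hence the two transposes. example_matrix and the seed are assumptions added here.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
example_matrix <- matrix(rnorm(15, mean = 10, sd = 3), nrow = 3, ncol = 5)

# scale() centers and scales each COLUMN, so transpose in and back out
zscored_scale <- t(scale(t(example_matrix)))

rowMeans(zscored_scale)        # should be 0 up to floating-point error
apply(zscored_scale, 1, sd)    # should be 1 for every row
```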

Coding with loops and custom functions

Loops can be combined with custom functions to improve simplicity and readability of your code

zscored <- matrix(nrow=nrow(mymatrix), ncol=ncol(mymatrix))

my_zscore_fn <- function(x){
  #Input: x, a numeric vector
  #Output: z-scored vector
  x_mean <- mean(x)
  x_sd <- sd(x)
  output <- (x - x_mean)/x_sd
  return(output)
}

for (i in 1:nrow(mymatrix)){
  input <- mymatrix[i,]
  zscored[i,] <- my_zscore_fn(input)
}
zscored
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.3864956 -1.1522649  1.5124925  0.3558644 -0.3295964
[2,] -0.5274012  0.3137401  0.7238038 -1.4735541  0.9634114
[3,]  1.4812730 -1.0907623 -0.3074597  0.4674380 -0.5504890
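
Once the work lives in a function, the loop itself can be handed to apply(). A minimal sketch, assuming a compact restatement of my_zscore_fn and a hypothetical example_matrix so the block runs on its own.

```r
# Compact restatement of the z-score function from this slide
my_zscore_fn <- function(x) {
  (x - mean(x)) / sd(x)
}

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
example_matrix <- matrix(rnorm(15, mean = 10, sd = 3), nrow = 3, ncol = 5)

# apply() over rows returns each row's result as a COLUMN, so transpose back
zscored_apply <- t(apply(example_matrix, 1, my_zscore_fn))
dim(zscored_apply)
```

Whether this reads better than the explicit loop is a matter of taste; both call the function once per row.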

Some advice on when to use loops

Vectorized functions are nearly always faster than explicit element-by-element loops

library(tictoc)

# 300 x 500 = 150000 values are needed to fill the matrix without recycling
myBIGmatrix <- matrix(rnorm(300 * 500, mean=10, sd=3),
                   nrow=300, ncol=500
                   )
BIGzscored <- matrix(nrow=nrow(myBIGmatrix), ncol=ncol(myBIGmatrix))

tic()
for (i in 1:nrow(myBIGmatrix)){
  #First calculate the row statistics
  row_mean <- mean(myBIGmatrix[i,])
  row_sd <- sd(myBIGmatrix[i,])
  #Then use these values to normalize each entry in the matrix
  for (j in 1:ncol(myBIGmatrix)){
    BIGzscored[i,j] <- (myBIGmatrix[i,j] - row_mean)/row_sd
  }
}
toc()
0.035 sec elapsed

Some advice on when to use loops

Vectorized functions save compute time with large datasets, but the difference is minimal if you are not dealing with a lot of data points.

library(tictoc)

# 300 x 500 = 150000 values are needed to fill the matrix without recycling
myBIGmatrix <- matrix(rnorm(300 * 500, mean=10, sd=3),
                   nrow=300, ncol=500
                   )
BIGzscored <- matrix(nrow=nrow(myBIGmatrix), ncol=ncol(myBIGmatrix))

tic()
for (i in 1:nrow(myBIGmatrix)){
  input <- myBIGmatrix[i,]
  BIGzscored[i,] <- my_zscore_fn(input)
}
toc()
0.016 sec elapsed
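
To try the comparison yourself without an extra package, here is a sketch that uses base R's system.time() in place of tictoc and adds a fully vectorized version with no explicit loop. The names big, z_loop and z_vec are hypothetical, and timings will differ from machine to machine.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
big <- matrix(rnorm(300 * 500, mean = 10, sd = 3), nrow = 300, ncol = 500)

# Row-by-row loop
loop_time <- system.time({
  z_loop <- matrix(nrow = nrow(big), ncol = ncol(big))
  for (i in seq_len(nrow(big))) {
    z_loop[i, ] <- (big[i, ] - mean(big[i, ])) / sd(big[i, ])
  }
})

# No explicit loop: a length-nrow vector is recycled down each column,
# so element [i, j] is paired with the statistics of row i
vec_time <- system.time({
  row_means <- rowMeans(big)
  row_sds <- apply(big, 1, sd)
  z_vec <- (big - row_means) / row_sds
})

all.equal(z_loop, z_vec)   # both approaches should agree
loop_time["elapsed"]
vec_time["elapsed"]
```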

Happy Looping